
    Recovering implicit pitch contours from formants in whispered speech

    Whispered speech is characterised by a noise-like excitation that results in the absence of a fundamental frequency. Since prosodic phenomena such as intonation are perceived through f0 variation, whispered prosody is relatively difficult to perceive. At the same time, studies have shown that speakers do attempt to produce intonation when whispering and that prosodic variability is transmitted, suggesting that intonation "survives" in the whispered formant structure. In this paper, we aim to estimate how formant contours correlate with an "implicit" pitch contour in whisper, using a machine learning model. We propose a two-step method: using a parallel corpus, we first transform the whispered formants into their phonated equivalents with a denoising autoencoder. We then analyse the formant contours to predict phonated pitch contour variation. We observe that our method is effective in establishing a relationship between whispered and phonated formants and in uncovering implicit pitch contours in whisper.
    Comment: 5 pages, 3 figures, 2 tables. Accepted at ICPhS 202
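The two-step method described in this abstract can be sketched as follows. This is a toy illustration under assumptions, not the authors' implementation: the data are synthetic, the models are small scikit-learn regressors standing in for the paper's denoising autoencoder and pitch predictor, and all array shapes are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy "parallel corpus": flattened F1-F3 trajectories (20 frames x 3 formants).
phonated = rng.normal(size=(200, 60))                               # phonated formants
whispered = phonated + rng.normal(scale=0.3, size=phonated.shape)   # whispered analogue
pitch = phonated @ rng.normal(size=(60,))                           # stand-in pitch target

# Step 1: denoising mapping from whispered to phonated formant contours
# (an MLP here stands in for the paper's denoising autoencoder).
dae = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
dae.fit(whispered, phonated)

# Step 2: predict phonated pitch contour variation from the reconstructed formants.
reg = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
reg.fit(dae.predict(whispered), pitch)

pred = reg.predict(dae.predict(whispered))
print(pred.shape)  # one pitch value per utterance in this toy setup
```

The key design point carried over from the abstract is the chaining: pitch is never predicted from whispered formants directly, but from their phonated reconstructions.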

    Voicing in Polish: interactions with lexical stress and focus

    Malisz Z, Żygis M. Voicing in Polish: interactions with lexical stress and focus. In: 18th International Congress of Phonetic Sciences. Glasgow; In Press.
    We examine the dynamics of VOT in Polish stops under lexical stress and focus. We elicit real Polish words containing voiced and voiceless stop+/a/ syllables in primary-stressed, secondary-stressed and unstressed positions, as well as under focus. We also correlate VOT with speech rate estimated on the basis of equisyllabic word length. Our results show that the relationships between prosody and VOT are consistent with the status of Polish as a true voicing language.

    Speaker-independent neural formant synthesis

    We describe speaker-independent speech synthesis driven by a small set of phonetically meaningful speech parameters such as formant frequencies. The intention is to leverage deep-learning advances to provide a highly realistic signal generator that includes the control affordances required for stimulus creation in the speech sciences. Our approach turns input speech parameters into predicted mel-spectrograms, which are rendered into waveforms by a pre-trained neural vocoder. Experiments with WaveNet and HiFi-GAN confirm that the method achieves our goals of accurate control over speech parameters combined with high perceptual audio quality. We also find that the small set of phonetically relevant speech parameters we use is sufficient to allow for speaker-independent synthesis (a.k.a. universal vocoding).
    Comment: 5 pages, 4 figures. Article accepted at INTERSPEECH 202
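The control flow described above (parameters → mel-spectrogram → vocoder → waveform) can be illustrated schematically. Everything below is a stand-in: the parameter set, the projection, the frame/hop sizes, and the silent placeholder vocoder are assumptions used only to show the pipeline shape, not the paper's trained models.

```python
import numpy as np

N_MELS, N_FRAMES, HOP = 80, 100, 256

def params_to_mel(params: np.ndarray) -> np.ndarray:
    """Stand-in for the learned parameter-to-spectrogram predictor."""
    proj = np.random.default_rng(0).normal(size=(params.shape[-1], N_MELS))
    # Broadcast one parameter frame across time: (N_FRAMES, N_MELS).
    return np.tile(params @ proj, (N_FRAMES, 1))

def vocoder(mel: np.ndarray) -> np.ndarray:
    """Stand-in for a pre-trained neural vocoder (e.g. HiFi-GAN):
    one hop's worth of samples per mel frame."""
    return np.zeros(mel.shape[0] * HOP)  # silent placeholder waveform

# Hypothetical control parameters: F1-F3 and f0, in Hz.
params = np.array([500.0, 1500.0, 2500.0, 120.0])
wave = vocoder(params_to_mel(params))
print(wave.shape)
```

The point of the decomposition is that the mel-spectrogram acts as the fixed interface: any vocoder trained on that representation can render the predicted frames, which is what makes the parameter-driven front end speaker-independent.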

    Acoustic-phonetic realisation of Polish syllable prominence: a corpus study.

    Malisz Z, Wagner P. Acoustic-phonetic realisation of Polish syllable prominence: a corpus study. In: Gibbon D, Hirst D, Campbell N, eds. Rhythm, melody and harmony in speech. Studies in honour of Wiktor Jassem. Speech and Language Technology. Vol 14/15. PoznaƄ, Poland; 2012: 105-114.

    Recording and transcription of speech and gesture in the narration of Polish adults and children

    In the present paper, we describe the experimental procedure, the details of the sound and video recording set-up, and the system for speech and gesture transcription and coding used in the Polish Cartoon Narration Corpus (PCNC) project. The audio-visual data come from a cartoon narration task performed by both children and adults. The recordings are transcribed orthographically and phonemically, and labelled for selected phenomena on a number of levels, including gesture, lexicon, prosody, and dialogue acts.

    Micro-timing of backchannels in human-robot interaction

    Inden B, Malisz Z, Wagner P, Wachsmuth I. Micro-timing of backchannels in human-robot interaction. Presented at the Timing in Human-Robot Interaction: Workshop in Conjunction with the 9th ACM/IEEE International Conference on Human-Robot Interaction (HRI2014), Bielefeld, Germany

    'Ja, mhm, ich verstehe dich' - Oszillator-basiertes Timing multimodaler Feedback-Signale in spontanen Dialogen ['Yes, mhm, I understand you' - Oscillator-based timing of multimodal feedback signals in spontaneous dialogues]

    Wagner P, Inden B, Malisz Z, Wachsmuth I. 'Ja, mhm, ich verstehe dich' - Oszillator-basiertes Timing multimodaler Feedback-Signale in spontanen Dialogen. In: Wolff M, ed. Elektronische Sprachsignalverarbeitung 2012 (Tagungsband ESSV) --- Studientexte zur Sprachkommunikation. Vol 64. Dresden: TUD Press; 2012: 179-187

    'Are you sure you're paying attention?' – 'Uh-huh'. Communicating understanding as a marker of attentiveness

    Buschmeier H, Malisz Z, Wlodarczak M, Kopp S, Wagner P. 'Are you sure you're paying attention?' – 'Uh-huh'. Communicating understanding as a marker of attentiveness. In: Proceedings of INTERSPEECH 2011. International Speech Communication Association; 2011: 2057-2060.
    We report on the first results of an experiment designed to investigate properties of communicative feedback produced by non-attentive listeners in dialogue. Listeners were found to produce less feedback when distracted by an ancillary task. A decreased number of feedback expressions communicating understanding was a particularly reliable indicator of distractedness. We argue this finding could be used to facilitate recognition of attentional states in dialogue system users. Index Terms: communicative feedback; dialogue; distraction; engagement; attention; dual task

    Dimensions of Segmental Variability: Interaction of Prosody and Surprisal in Six Languages

    Contextual predictability variation affects phonological and phonetic structure. Reduction and expansion of acoustic-phonetic features is also characteristic of prosodic variability. In this study, we assess the impact of surprisal and prosodic structure on phonetic encoding, both independently of each other and in interaction. We model segmental duration, vowel space size and spectral characteristics of vowels and consonants as a function of surprisal as well as of syllable prominence, phrase boundary, and speech rate. Correlates of phonetic encoding density are extracted from a subset of the BonnTempo corpus for six languages: American English, Czech, Finnish, French, German, and Polish. Surprisal is estimated from segmental n-gram language models trained on large text corpora. Our findings are generally compatible with a weak version of Aylett and Turk's Smooth Signal Redundancy hypothesis, suggesting that prosodic structure mediates between the requirements of efficient communication and the speech signal. However, this mediation is not perfect, as we found evidence for additional, direct effects of changes in surprisal on the phonetic structure of utterances. These effects appear to be stable across different speech rates.
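Segment-level surprisal of the kind estimated above can be computed from an n-gram model as the negative log-probability of a segment given its context. A minimal bigram sketch under assumptions (a toy segment sequence and add-one smoothing; the study's actual models are higher-order and trained on large corpora):

```python
import math
from collections import Counter

segments = list("abaabbaab")  # toy "phone" sequence standing in for a corpus
bigrams = Counter(zip(segments, segments[1:]))
unigrams = Counter(segments[:-1])
vocab = set(segments)

def surprisal(prev: str, seg: str) -> float:
    """-log2 P(seg | prev), estimated from bigram counts with add-one smoothing."""
    p = (bigrams[(prev, seg)] + 1) / (unigrams[prev] + len(vocab))
    return -math.log2(p)

# Frequent transition -> low surprisal; the reverse context is costlier.
print(round(surprisal("a", "b"), 3))
```

In the study's framework, values like these are then entered alongside prominence, boundary, and rate predictors to model segmental duration and spectral measures.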
    • 

    corecore